Local Lazy Regression: Making Use of the Neighborhood to Improve QSAR Predictions

Authors

  • Rajarshi Guha
  • Debojyoti Dutta
  • Peter C. Jurs
  • Ting Chen

Abstract

Traditional quantitative structure-activity relationship (QSAR) models aim to capture global structure-activity trends present in a data set. In many situations, however, there may be groups of molecules that share a specific set of features related to their activity or inactivity. Such a group of features can be said to represent a local structure-activity relationship, which traditional QSAR models may not recognize. In this work, we investigate the use of local lazy regression (LLR), which obtains a prediction for a query molecule from its local neighborhood rather than from the whole data set. This modeling approach is especially useful for very large data sets because no a priori model need be built. We applied the technique to three biological data sets. In the first case, the root-mean-square error (RMSE) for an external prediction set was 0.94 log units versus 0.92 log units for the global model. However, LLR was able to characterize a specific group of anomalous molecules with much better accuracy (0.64 log units versus 0.70 log units for the global model). For the second data set, the LLR technique decreased the RMSE for the external prediction set from 0.36 log units to 0.31 log units. In the third case, we obtained an RMSE of 2.01 log units versus 2.16 log units for the global model. In all cases, LLR predicted a few observations poorly compared with the global model. We present an analysis of why this was observed and of possible improvements to the local regression approach.
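
The abstract describes LLR only at a high level: fit a model to the query's neighborhood, predict, and discard the model. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each molecule is a numeric descriptor vector, that the neighborhood is the k nearest training molecules by Euclidean distance, and that the local model is an ordinary least-squares linear fit with an intercept. The descriptor set, distance metric, neighborhood size and regression method used in the paper are not given here, and the function names `llr_predict`, `_solve` and `rmse` are hypothetical.

```python
import math


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row swaps also move the right-hand side.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: neighborhood too small or descriptors collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def llr_predict(train_x, train_y, query, k):
    """Predict the query's activity from a linear model fit only to its k nearest neighbors."""
    p = len(query)
    if k < p + 1:
        raise ValueError("k must be at least (number of descriptors + 1)")
    # 1. Rank training molecules by Euclidean distance to the query in descriptor space.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    neighbors = order[:k]
    # 2. Local design matrix: a leading 1.0 column for the intercept.
    rows = [[1.0] + [float(v) for v in train_x[i]] for i in neighbors]
    ys = [float(train_y[i]) for i in neighbors]
    # 3. Ordinary least squares via the normal equations (X^T X) beta = X^T y.
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p + 1)] for a in range(p + 1)]
    xty = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(p + 1)]
    beta = _solve(xtx, xty)
    # 4. Apply the local model to this query only; nothing is stored for later queries.
    return beta[0] + sum(b * q for b, q in zip(beta[1:], query))


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted activities."""
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, predicted)) / len(observed))
```

For example, with a single descriptor x whose activity follows y = 2x for x below 5 and y = 30 - x above it, `llr_predict(train_x, train_y, [1.5], k=3)` uses only neighbors on the first branch and returns 3.0, and a query at 8.5 returns 21.5 from the second branch, whereas one global line through all points would blend the two trends. The normal equations are used here only for brevity; with many or correlated descriptors, a numerically stabler least-squares solver would be the safer choice.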

Similar resources

Memory-Based Methods for Regression and Classification

Memory-based learning methods operate by storing all (or most) of the training data and deferring analysis of that data until "run time" (i.e., when a query is presented and a decision or prediction must be made). When a query is received, these methods generally answer the query by retrieving and analyzing a small subset of the training data, namely, data in the immediate neighborhood of the qu...

An Ensemble Approach to Instance-Based Regression Using Stretched Neighborhoods

Instance-based regression methods generate solutions from prior solutions within a neighborhood of the input query. Their performance depends on both neighborhood selection criteria and on the method for generating new solutions from the values of prior instances. This paper proposes a new approach to addressing both problems, in which solutions are generated by an ensemble of solutions of loca...

Matrix Sequential Hybrid Credit Scorecard Based on Logistic Regression and Clustering

The Basel II Accord pointed out benefits of credit risk management through internal models to estimate Probability of Default (PD). Banks use default predictions to estimate the loan applicants’ PD. However, in practice, PD is not useful and banks applied credit scorecards for their decision making process. Also the competitive pressures in lending industry forced banks to use profit scorecards...

in silico screening of IL-1β production inhibitors using chemometric tools

IL-1β plays a major role in inflammatory disorders, and IL-1β production inhibitors can be used in the treatment of inflammatory and related diseases. In this study, quantitative relationships between the structures of 46 pyridazine derivatives (inhibitors of IL-1β production) and their activities were investigated by Multiple Linear Regression (MLR) technique Stepwise Regression Method (ES-S...

Computational Study of Quinolone Derivatives to Improve their Therapeutic Index as Anti-malaria Agents: QSAR and QSTR

Malaria is a parasitic disease with limited chemotherapy options; moreover, drug resistance frequently occurs. The speed of drug development should be faster to overcome the emerging drug resistance. In the current study, a series of quinolone derivatives were subjected to quantitative structure-activity relationship analysis to identify the ideal physicochemical charact...

Journal title:
  • Journal of chemical information and modeling

Volume 46, Issue 4

Publication date: 2006